Isolation and purification of the eighth gene of HTLV-III

ABSTRACT

The present invention is the isolation and purification of a newly discovered gene of the AIDS virus, HTLV-III, which encodes a protein which is immunogenic and recognized by sera of some HTLV-III seropositive people. Furthermore, the gene is highly conserved among all known HTLV-III isolates and exhibits a polymorphism at the 3' end which distinguishes molecular clones of the HTLV-IIIB cell line from viral genomes of related viruses (i.e., other HTLV-III isolates, LAV, ARV, etc.).

BACKGROUND OF THE INVENTION

The present invention is the isolation and purification of a newly discovered gene of the AIDS virus, HTLV-III, which encodes a protein which is immunogenic and recognized by sera of some HTLV-III seropositive people. Furthermore, the gene is highly conserved among all known HTLV-III isolates and exhibits a polymorphism at the 3' end which distinguishes molecular clones of the HTLV-IIIB cell line from viral genomes of related viruses (i.e., other HTLV-III isolates, LAV, ARV, etc.). Also, the gene or the gene product(s) may be suitable targets for antiviral therapy.

Four distinct isolates of acquired immune deficiency syndrome (AIDS) virus have been previously characterized in detail:

HTLV-IIIRF was obtained from a 25-year-old black Haitian man who immigrated to the United States in 1980. HTLV-IIIB denotes a group of closely related viruses obtained in 1983 from several different patients with AIDS or ARC (AIDS-related complex) in the New York City area. LAV-la was obtained from a biopsied lymph node of a French homosexual man with lymphadenopathy syndrome who had over 50 different sexual partners per year and had travelled to many countries, including the United States. ARV-2 was isolated from the peripheral blood of a homosexual man from San Francisco in 1984, one month before the diagnosis of AIDS was established. Representative clones comprising the full-length genomes of each of these viruses have been described and their nucleotide sequences published (see, for example, Ratner et al., Nature, Vol. 313, pages 277-284, 1985).

Human T-cell Lymphotropic Virus Type III (HTLV-III), the etiological agent of Acquired Immune Deficiency Syndrome (AIDS) and now known generically as human immunodeficiency virus (HIV), is a member of the retrovirus family. However, the complexity of its genomic structure is unprecedented among retroviruses. In addition to the three structural genes (gag, pol, and env) shared with other retroviruses, four additional genes have already been identified (sor, 3'orf, tat-III, and art/trs). Two of these, sor and 3'orf, were originally identified as open reading frames by nucleotide sequence studies and have been verified to encode serologically reactive proteins of 23 kd and 27 kd, respectively. Both of these genes appear to be dispensable for production of infectious cytopathic virions, although mutants lacking the sor gene are greatly compromised in the level of virus production. The transactivator gene (tat-III) was identified functionally by the capacity of its product to enhance expression of genes linked to the HTLV-III long terminal repeat (LTR). It is transcribed from three discontiguous segments of the HTLV-III genome into a 2.0 kb mRNA. The resulting protein (p14) is required for virus replication, and its level of expression correlates directly with the level of virus proteins produced, but not necessarily with the level of viral mRNA expression. Recently, a seventh viral gene product was found to be encoded by the same spliced mRNA as tat-III, but in an alternate reading frame [Feinberg et al, Cell, 46:807-817 (1986); and Sodroski et al, Nature, 321:412-417 (1986)]. The function of this gene is believed to be either to reverse an intrinsic block on the translation of HTLV-III gag and env mRNA into proteins, or to regulate the relative amounts of HTLV-III genomic and spliced subgenomic mRNA. If the former function is correct, the gene will be named art (anti-repressor transactivator); if the latter is correct, the gene will be named trs (trans-acting regulator of splicing).
While the major effects of both tat-III and art/trs are posttranscriptional, HTLV-III-infected cells also synthesize a transcriptional activator specific for transcription from its own LTR (Okamoto and Wong-Staal, Cell, in press). This viral factor, whose location on the viral genome is still unknown, may be distinct from tat-III.

Inspection of the nucleotide sequences of diverse HTLV-III isolates revealed at least two other open reading frames that can potentially encode proteins of 80-100 amino acids. One of these, referred to as R, is the subject of the present invention and is highly conserved among HTLV-III isolates.

The HTLV-III genome is unusually complex for a retrovirus, possessing, in addition to the replicative genes (gag, pol, and env), at least three extra genes (sor, tat, and 3'orf). Of these, the transactivator gene of HTLV-III (tat-III) has been characterized and shown to be critical to virus replication. The sor and 3'orf genes, originally identified as open reading frames, have been shown to encode proteins which are immunogenic in vivo, but the functions of these genes are as yet unknown.

Material Information Disclosure

Methods for the detection of HTLV-III antibodies are disclosed in Gallo et al (U.S. Pat. No. 4,520,113).

The isolation and purification of the R gene have not yet been disclosed in any publication. However, the following publications are deemed pertinent:

Ratner et al, Nature, 313:277-284 (1985) discloses the nucleotide sequencing data of HTLV-III.

Muesing et al, Nature, 313:430-458 (1985) discloses the nucleotide sequences of the gag, pol, and env genes, as well as 7 exons of the genome (including the sequences for the sor and 3'orf genes, originally identified as open reading frames).

Arya et al, Science, 229:69-73 (1985) discloses the tat-III gene.

Feinberg et al, Cell, 46:807-817 (1986) and Sodroski et al, Nature, 321:412-417 (1986) disclose the 7th gene, art/trs (anti-repressor transactivator/trans-acting regulator of splicing).

DESCRIPTION OF THE FIGURES

FIG. 1 depicts the R gene in relation to the nucleotide sequences of the HTLV-III genome.

FIG. 2 shows Western immunoblot strip assays for serological detection of the R gene product by patients' sera.

SPECIFIC DISCLOSURE OF THE INVENTION

The location of the R open reading frame in relation to the rest of the HTLV-III genome is shown in FIG. 1. The R gene overlaps with the sor gene, terminating before the first tat coding exon. Purified protein product from the R gene is produced by cleaving the HTLV-III DNA clone BH10 with Stu I and Sal I to produce a 379 bp Stu I-Sal I fragment. This fragment corresponds to nucleotides 5440-5777, using the numbering system described in Ratner et al, Nature, 313:277-284 (1985). The 379 bp fragment is then digested with Sau96 I to produce a 201 bp Sau96 I-Sal I fragment. The 201 bp fragment is then purified and subcloned into an expression vector. Expression vectors are well known to practitioners in the art; examples may be found in Maniatis et al, Molecular Cloning, Cold Spring Harbor Laboratory (1982). The peptide expressed from this expression vector is a 124 amino acid fusion protein containing 68 amino acids from the R region of HTLV-III, 43 amino acid residues at the amino terminus from the vector, and 13 amino acids at the carboxy terminus from the vector. The 68 amino acids derived from the R region represent all but 8 amino-terminal and 2 carboxy-terminal residues of this reading frame. Cell lysate obtained from expressor clone R18 reveals a specific protein band migrating as a 15 kd protein. This protein is purified to 95% purity by one chromatography step on a Sephacryl S-300 column, followed by preparative SDS-polyacrylamide gel electrophoresis.
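The fragment preparation above depends on where Stu I, Sal I, and Sau96 I cut. The Python below is a minimal sketch that is not part of the original disclosure. It locates those sites in a DNA string and reports the resulting top-strand fragment lengths. The recognition sites and cut offsets are the standard ones (Stu I AGG^CCT, Sal I G^TCGAC, Sau96 I G^GNCC). The toy sequence is hypothetical and is not the BH10 sequence. The reported lengths also ignore the single-stranded overhangs left by Sal I and Sau96 I.

```python
import re

# Recognition site and top-strand cut offset (bases from the start of the site).
# All three sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "Stu I": ("AGGCCT", 3),   # AGG^CCT, blunt ends
    "Sau96 I": ("GGNCC", 1),  # G^GNCC, 3-nt 5' overhang
    "Sal I": ("GTCGAC", 1),   # G^TCGAC, 4-nt 5' overhang
}

# N is a degenerate position that matches any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def site_regex(site: str) -> str:
    """Translate a recognition site containing N into a regular expression."""
    return "".join(IUPAC[base] for base in site)


def cut_positions(seq: str, enzymes: list[str]) -> list[int]:
    """Top-strand cut positions (0-based, between bases) for the given enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # A zero-width lookahead reports overlapping sites as well.
        for match in re.finditer(f"(?={site_regex(site)})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def fragment_lengths(seq: str, enzymes: list[str]) -> list[int]:
    """Top-strand lengths of the linear fragments from a complete digest."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


toy = "TTAGGCCTAAAAGGACCTTTTTGTCGACTT"  # hypothetical toy sequence, not BH10
print(cut_positions(toy, ["Stu I", "Sau96 I", "Sal I"]))     # [5, 13, 23]
print(fragment_lengths(toy, ["Stu I", "Sau96 I", "Sal I"]))  # [5, 8, 10, 7]
```

Sau96 I recognizes a short degenerate site, so a real viral sequence would usually contain several of them. That is why the sketch reports every site rather than assuming a single cut.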

As shown in Table 2, the R gene is highly conserved among the nine proviruses for which sequence information is available. This is particularly true in the first 72 amino acids, where less than 9% divergence is observed even among isolates which differ by more than 10% in their gag genes and more than 20% in their env genes. The carboxy terminus of the R gene exhibits another notable feature. All four clones derived from the HTLV-IIIB cell line (HXB2, BH10, BH5, and H9pv) contain a termination codon after amino acid 77. The remaining clones, derived from LAV/BRU, ARV, HAT3, ELI, and MAL, all differ from the HTLV-IIIB clones at amino acids 73-77 because of a frame shift, which also results in an extension of 18 or 19 amino acids.
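A small sketch can illustrate the frame-shift mechanism mentioned above. Inserting a single base changes every downstream codon, so the residues after the insertion differ and the stop codon moves. The Python below uses the standard genetic code. The 20-base DNA is a hypothetical toy sequence rather than an HTLV-III sequence, and the helper names are illustrative.

```python
# Standard genetic code, built from the conventional TCAG-ordered table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base, stopping after the first stop codon ('*')."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


def insert_base(dna: str, position: int, base: str) -> str:
    """Return dna with one extra base inserted before index `position`."""
    return dna[:position] + base + dna[position:]


reference = "ATGAAACAGTAGGCTGCTAA"        # hypothetical toy ORF
mutant = insert_base(reference, 9, "A")    # single-base insertion (frame shift)
print(translate(reference))  # MKQ*
print(translate(mutant))     # MKQIGC*
```

The reference stops after three residues. The one-base insertion reads through the original stop and gives a longer product with a different carboxy terminus. This resembles the kind of difference described for the R gene, though the sketch does not claim which isolate gained or lost a base.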

Nucleotide sequences of several HTLV-III/LAV isolates show amino acid sequence variability between viral isolates from different individuals and between sequential isolates from one individual. This variability is clustered in the envelope protein, particularly in gp120, and recent evidence indicates that some neutralizing antibodies are virus type-specific. For example, antisera to gp120 of the HTLV-IIIB isolate do not neutralize the divergent RF isolate, whereas neutralizing antibody from an AIDS patient blocks infectivity of both isolates. In addition, the presence of type-specific epitopes was shown by selecting resistant HTLV-III/LAV isolates through passage of one sensitive isolate in the presence of neutralizing antibodies. The resistant isolates are resistant to the selecting antibodies but sensitive to neutralizing antibodies from other patients.

Whereas the variable regions of the exterior envelope proteins of the different HTLV-III/LAV viruses possess properties typical of antigenic epitopes, the conserved regions generally do not, since they are generally hydrophobic and lack beta turns. An exception to this is a stretch of 45 amino acids immediately 5' to the envelope precursor clip site. This region is hydrophilic, contains numerous beta turns, and is highly conserved in all four viruses; experiments have shown that it is both immunogenic and broadly cross-reactive.

The finding of genomic variation primarily in exterior envelope sequences corresponding to predicted antigenic epitopes of different AIDS viruses suggests that immune pressures may be important in the selection of variant viral strains. It has been shown [Gonda et al., Science, Vol. 227, pages 173-177 (1985) and Shaw et al., Science, Vol. 227, pages 177-182 (1985)] that HTLV-III is significantly related in its nucleotide sequence, morphology, and biological behavior to visna virus, a member of the lentivirus family. This virus, as well as another lentivirus, equine infectious anemia virus (EIAV), appears to avoid elimination by host neutralizing antibodies as a result of divergence in its envelope glycoproteins.

One embodiment of the present invention rests on the finding that the R gene is highly conserved in the first 72 amino acids and yet is polymorphic at the carboxy terminus. In particular, this polymorphism enables the R gene and the R gene product(s) to distinguish all clones of the HTLV-IIIB cells from all other existing virus isolates (or the clones derived from those isolates). Because it is highly conserved, the R gene may also be suitable for use as another target of antiviral therapy. Additionally, this shows that the HTLV-IIIB viruses and LAV are derived from distinct lineages.
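Percent divergence of the kind quoted for the R gene is commonly computed as the fraction of aligned positions that differ (the p-distance). The sketch below assumes that definition; the original text does not state how its divergence figures were calculated. The test inputs are the carboxy-terminal extension segments printed in Table 2 for ARV2, LAV, ELI, and MAL, which are already aligned residue for residue.

```python
def p_distance(a: str, b: str) -> float:
    """Percent of aligned positions at which two pre-aligned sequences differ.

    Positions with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = [(x, y) for x, y in zip(a.lower(), b.lower()) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    mismatches = sum(x != y for x, y in compared)
    return 100.0 * mismatches / len(compared)


# Carboxy-terminal extension segments as printed in Table 2 (18 residues each).
arv2 = "rigcqhsr" + "igiiqqrrqr"
lav = "rigcrhsr" + "igvtqqrrar"
eli = "rigcqhsr" + "igiirqrrar"
mal = "rigcqhsr" + "igitrqrrar"

print(f"{p_distance(arv2, lav):.1f}")  # 22.2
print(f"{p_distance(eli, mal):.1f}")   # 5.6
```

Over these 18 aligned residues, ARV2 and LAV differ at 4 positions, while ELI and MAL differ at 1.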

In the preferred embodiment, a Stu I-Sal I fragment from the BH10 clone of Human T-cell Lymphotropic Virus Type III is isolated and further digested with Sau96 I. The 201 bp Sau96 I-Sal I fragment is purified, then treated with DNA polymerase (Klenow fragment) in the presence of all four dNTPs, and ligated into the REV expression vector, which had previously been digested with Sma I [the expression vector was obtained from Dr. S. Putney at Repligen and is described in Grayeb et al, DNA, 5:93-99 (1986)]. The ligated DNA is used to transform E. coli strain SB221; a colony designated R18 contains the plasmid with the R gene sequences in the correct orientation. As is readily apparent to practitioners in the art, many slight deviations in the process noted above may be made without sacrificing the end result. Also, it will be apparent to practitioners in the art that other methods of separating the R gene may be used, now that the location of the gene is known.
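Klenow fill-in makes both ends of the Sau96 I-Sal I fragment blunt, and Sma I leaves blunt vector ends. The insert can therefore ligate in either orientation, which is why colony R18 had to be identified as carrying it in the correct one. The sketch below shows the logic of an orientation call from a sequence read across the cloning site. The vector flanks and insert are hypothetical. In practice, orientation would usually be confirmed by restriction mapping or sequencing rather than by exact string matching.

```python
# Complement table for upper- and lower-case DNA letters.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def insert_orientation(construct: str, insert: str) -> str:
    """Classify how a blunt-ended insert sits in a construct sequence.

    Returns 'forward' if the insert's top strand is present, 'reverse' if only
    its reverse complement is present, and 'absent' otherwise. A palindromic
    insert matches both strands and is reported as 'forward'.
    """
    construct, insert = construct.upper(), insert.upper()
    if insert in construct:
        return "forward"
    if reverse_complement(insert) in construct:
        return "reverse"
    return "absent"


# Hypothetical example: flanks and insert are made up for illustration.
left_flank, right_flank = "GGTACC", "GAGCTC"
insert = "ATGCCCAAGTTT"
forward_clone = left_flank + insert + right_flank
reverse_clone = left_flank + reverse_complement(insert) + right_flank
print(insert_orientation(forward_clone, insert))  # forward
print(insert_orientation(reverse_clone, insert))  # reverse
```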

One method of expression of the R protein in E. coli and subsequent purification of the protein is disclosed in Example 3.

Any of a large number of available host cells may be used in the combinations of this invention. The selection of a particular host is dependent upon a number of factors recognized by the art. These include, for example, compatibility with the virus, toxicity of proteins encoded by the virus, ease of recovery of the desired virus or protein product, expression characteristics, bio-safety, and costs. A balance of these factors must be struck with the understanding that not all hosts may be equally effective for expression of a particular recombinant DNA molecule.

Prior to the present invention, AIDS virus variants were propagated in an HT parental cell line, H9. This cell line, as well as other cell lines capable of immortalizing AIDS virus variants, is described in Popovic et al, Science, 224:497 (1984) and Sarngadharan et al, Science, 224:506 (1984). Of critical concern is evidence that the T-cell tropism of the virus may be an acquired behavior resulting from the continual propagation of the virus in T4+ cells (particularly H9 cells) both in vitro and in vivo. Quantitative titration of an HTLV-IIIB isolate on T-cells and M/M cells showed a 10,000-fold greater susceptibility of OKT4+ T-cells than of M/M cells. Limited evidence suggests that cells other than T-lymphocytes can be infected by the virus. It has been demonstrated that some rare B lymphocytes and a neoplastic cell line of monocyte-macrophage origin (both of which express T4 antigen) can be infected in vitro with HTLV-III/LAV. See Montagnier et al, Science, Vol. 225, p. 63 (1984); Dalgleish et al, Nature, Vol. 312, p. 763 (1984); and Levy et al, Virology, Vol. 147, p. 441 (1985). It has also been shown that HTLV-III/LAV sequences exist within the DNA and messenger RNA of brain tissues recovered from neurosymptomatic AIDS patients.

Another embodiment of the present invention is the production of antibodies, both polyclonal and monoclonal, which bind to the R gene sequences. Antibody molecules appear in the blood serum of an animal or human in response to an injection of an antigen, a protein, or another macromolecule foreign to the host species. Accordingly, the present invention (the R gene, the protein encoded by the R gene, or segments of the R gene) may be used as an antigen in the production of antibodies. In the preferred method, the antibodies of the present invention are produced by hyperimmunizing rabbits with an R gene antigen.

Monoclonal antibodies which specifically bind to the R gene region are also within the scope of the present invention. Using the well-known technique of Kohler and Milstein, "immortal" clones of cells making single antibody specificities are produced by fusing normal antibody-forming cells with an appropriate B-cell tumor line. The resulting hybridomas may be selected in a tissue culture medium which fails to support growth of the parental cell types, or by successive dilutions or by plating out. The single clones thus produced are then propagated in spinner culture or grown in the ascitic form in mice.
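Cloning by successive dilution is commonly planned with a Poisson model of how many cells land in each well. This sketch assumes that model and also assumes every seeded cell grows, which real plating efficiencies do not guarantee. At a mean of λ cells per well, the probability of k cells in a well is P(k) = λ^k e^(-λ) / k!. The chance that a well showing growth is monoclonal is P(1) / (1 - P(0)). The function names are illustrative.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k cells in a well at `mean` cells per well."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def limiting_dilution(mean: float) -> dict[str, float]:
    """Expected well outcomes when plating at `mean` cells per well.

    Assumes Poisson seeding and that every seeded cell grows.
    """
    if mean <= 0:
        raise ValueError("mean cells per well must be positive")
    p0 = poisson_pmf(0, mean)
    p1 = poisson_pmf(1, mean)
    return {
        "empty": p0,
        "single": p1,
        "multiple": 1.0 - p0 - p1,
        # Among wells that show growth, the fraction seeded by one cell.
        "monoclonal_given_growth": p1 / (1.0 - p0),
    }


for mean in (1.0, 0.3):
    print(mean, round(limiting_dilution(mean)["monoclonal_given_growth"], 3))
# 1.0 0.582
# 0.3 0.857
```

At one cell per well, only about 58% of growing wells are expected to be monoclonal, compared with about 86% at 0.3 cells per well. This is one reason hybridomas are often recloned.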

One skilled in the art will also recognize that the present invention may be subjected to a variety of processes well known in the art. Some of these include molecular cloning; immortalization in a suitable host cell; detection, diagnostic, and prognostic test kits; and the production of synthetic peptides.

DEFINITIONS

The following are the definitions of words used in the specification and claims.

ANTIBODY: serum protein produced in response to an immunogen.

ANTIGEN: substance capable of reacting with its specific antibody.

CLONING VEHICLE (VECTOR): A plasmid, phage DNA, or other DNA sequence which is able to replicate in a host cell, characterized by one or a small number of endonuclease recognition or restriction sites at which such DNA sequences may be cut in a determinable fashion without attendant loss of an essential biological function of the DNA, e.g., replication, production of coat proteins, loss of promoter or binding sites, and which contains a marker suitable for identifying the transformed cells (usually tetracycline resistance or ampicillin resistance).

DNA SEQUENCE: A linear array of nucleotides connected one to the other by phosphodiester bonds between the 3' and 5' carbons of adjacent pentoses.

EXPRESSION: The process undergone by a structural gene to produce a polypeptide. This is a combination of transcription and translation.

EXPRESSION CONTROL SEQUENCE: A sequence of nucleotides that controls and regulates expression of genes when operatively linked to those genes.

GENOME: The entire DNA of a cell or virus. It includes the structural genes coding for the polypeptides of the substance, as well as operator, promoter, and ribosome binding and interaction sequences.

IMMUNOGEN: synonymous with antigen; more precisely, a substance capable of eliciting an immune response.

NUCLEOTIDE: A monomeric unit of DNA or RNA consisting of a sugar moiety (pentose), a phosphate, and a nitrogenous heterocyclic base. The base is linked to the sugar moiety via the glycosidic carbon (1' carbon of the pentose), and the combination of the base and the sugar is a nucleoside. The base characterizes the nucleotide. The four DNA bases are adenine (A), guanine (G), cytosine (C), and thymine (T). The four RNA bases are A, G, C, and uracil (U).

RECOMBINANT DNA MOLECULE OR HYBRID DNA: A molecule consisting of segments of DNA from different genomes (the entire DNA of a cell or virus) which have been joined end-to-end outside of living cells and have the capacity to infect a host cell and be maintained therein.

STRUCTURAL GENE: A DNA sequence which encodes through its template or messenger RNA a sequence of amino acids characteristic of a specific polypeptide.

TRANSCRIPTION: The process of producing mRNA from a structural gene.

TRANSLATION: The process of producing a polypeptide from mRNA.

EXAMPLES

Example 1

Over one hundred serum samples were examined for antibodies which specifically bind to the R18 protein of the present invention. The results are summarized in Table 1. None of the 50 samples collected from healthy individuals showed positive reactivity against the R18 protein. In contrast, 28 of the 82 sera (about 34%) from known HTLV-III-infected individuals (as determined by serological reactivity against gag (p24) and/or env (gp41) proteins) tested positive against the R18 protein. This result indicates that the R region of the genome encodes a protein which is immunoreactive. When the positive sera were broken down into asymptomatic carriers, ARC patients, and AIDS patients, there appeared to be a decline in both the frequency and titer of positive sera with progression of the disease (see Table 1 and FIG. 2 for representative Western blot results).

Example 2

In an attempt to assess the role of the tat-III gene, a mutant was constructed with an EcoRI-Sal I deletion which removed the splice acceptor site of the tat-III mRNA. As shown in FIG. 1, this mutant also deletes 15 amino acids at the carboxy end of the R gene. This mutant was able to transcribe the appropriate viral mRNA species at normal levels, but was severely compromised at the level of virion production, a property which was attributed, at the time, to the reduced tat-III activity. Indeed, complementation with bacterially expressed tat-III protein partially restored virus expression. This result argues that the R gene, like the 3'orf gene, is not required for virus replication, but may play a role in the in vivo pathogenesis of the virus.

Example 3

For purification of the R protein, 6 liters of R18 cells were cultured for eight hours in L broth containing 100 µg/ml ampicillin. The cells were harvested and resuspended in 50 mM Tris-HCl, pH 8.0, 1 mM PMSF, in a volume equal to half their wet weight. The cell suspension was then treated with lysozyme (5 mg/ml) for 1 hour at 37° C. and disrupted using a Bead-Beater. The R18 protein, present in the insoluble material after cell lysis, was extracted in 50 mM Tris-HCl, pH 8.0, containing 1 M sodium chloride, 8 M urea, and 10 mM dithiothreitol, and applied to a Sephacryl S-300 column. Fractions containing the R18 protein were pooled and concentrated. The R18 protein, 70% pure at this stage, was further purified by preparative SDS-polyacrylamide gel electrophoresis. The gel-purified protein was >95% pure.
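As a rough consistency check on the R18 product (not part of the original example), the 124-residue fusion described in the specification can be converted to an approximate mass. That fusion consists of 43 vector residues, 68 R residues, and 13 vector residues. The check uses the common rule of thumb of about 110 Da per amino acid residue. That constant and the helper names below are assumptions for illustration.

```python
# Rule-of-thumb average mass of one amino acid residue in a protein (assumption).
AVG_RESIDUE_MASS_DA = 110.0


def fusion_length(vector_n_term: int, insert: int, vector_c_term: int) -> int:
    """Total residues in an N-terminal vector / insert / C-terminal vector fusion."""
    return vector_n_term + insert + vector_c_term


def approx_mass_kda(residues: int, avg_residue_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Approximate polypeptide mass in kilodaltons."""
    return residues * avg_residue_da / 1000.0


n = fusion_length(43, 68, 13)
print(n, round(approx_mass_kda(n), 1))  # 124 13.6
```

This estimate of about 13.6 kDa is in the same range as the 15 kd band reported for the R18 lysate. Apparent sizes from SDS-polyacrylamide gels can deviate from calculated masses by roughly this much.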

                            TABLE 1
______________________________________________________________
Prevalence of Antibodies Against the
R-Gene Product of HTLV-III Infected Patients
Clinical Status                   Number   Number    Percent
                                  Tested   Positive  Positive
______________________________________________________________
Healthy donors                    50       0         0
Asymptomatic infected patients*   15       7         47
ARC*                              38       16        42
AIDS*                             29       5         17
______________________________________________________________
*All these sera were positive for antibodies to p24 and/or gp41.
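The percentages in Table 1 follow from the counts by simple division. The sketch below recomputes them, rounding half up. As an addition not present in the original, it also attaches a Wilson 95% score interval to each group to show how wide the uncertainty is at these sample sizes. The data layout and function names are illustrative.

```python
import math

# Counts transcribed from Table 1: (clinical status, number tested, number positive).
TABLE_1 = [
    ("Healthy donors", 50, 0),
    ("Asymptomatic infected patients", 15, 7),
    ("ARC", 38, 16),
    ("AIDS", 29, 5),
]


def percent_positive(tested: int, positive: int) -> int:
    """Percent positive, rounded half up to a whole number as in Table 1."""
    return int(100 * positive / tested + 0.5)


def wilson_interval(tested: int, positive: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = positive / tested
    denom = 1 + z * z / tested
    center = (p + z * z / (2 * tested)) / denom
    half = z * math.sqrt(p * (1 - p) / tested + z * z / (4 * tested * tested)) / denom
    return max(0.0, center - half), min(1.0, center + half)


for status, tested, positive in TABLE_1:
    lo, hi = wilson_interval(tested, positive)
    print(f"{status}: {percent_positive(tested, positive)}% "
          f"(95% CI {100 * lo:.0f}-{100 * hi:.0f}%)")
```

For example, the 47% figure for asymptomatic carriers rests on 15 sera, and its interval spans roughly 25% to 70%. The apparent decline with disease progression should therefore be read with the group sizes in mind.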

                                TABLE 2
________________________________________________________________________
Alignment of the R-Gene Protein Sequence
Clone*
________________________________________________________________________
BH10  meqapedqgp  qrephnewtl  elleelknea  vrhfpriwlh  glygqhiyety
BH5               k
HXB2
H9pv
ARV2              y           r           p           sy
LAV
ELI   a           ya          s                       s
MAL   a                       q                       s
HAT3              y           s           l           s

BH10  gdtwagveai  irilqqllfi  hfqnwvst*
BH5                           *
HXB2                          *
H9pv                          *
ARV2                          rigcqhsr    igiiqqrrqr  rngasrs*
LAV                           rigcrhsr    igvtqqrrar  ngasrs*
ELI   v                       rigcqhsr    igiirqrrar  ngssrs*
MAL   e           s           rigcqhsr    igitrqrrar  ngssrs*
HAT3                          eigcqhsr    igitrqrrar  ngasrs*
________________________________________________________________________
*BH10, BH5: Ratner et al, Nature, 313:277-284 (1985); HXB2: Ratner et al, unpublished; H9pv: Muesing et al, Nature, 313:430-458 (1985); ARV2: Sanchez-Pescador et al, Science, 227:484-492 (1985); LAV: Wain-Hobson et al, Cell, 40:9-17 (1985); ELI, MAL: Alizon et al, Cell, 46:63-74 (1986); HAT3: Starcich et al, Cell, 45:637-648 (1986).
a, alanine; r, arginine; n, asparagine; d, aspartic acid; c, cysteine; q, glutamine; e, glutamic acid; g, glycine; h, histidine; i, isoleucine; l, leucine; k, lysine; m, methionine; f, phenylalanine; p, proline; s, serine; t, threonine; w, tryptophan; y, tyrosine; v, valine.

We claim:
 1. R gene of HTLV-IIIB as shown in FIG. 1A.
 2. A cloning vector comprising the R gene of claim 1.